Cloud computing’s appeal lies in its dynamic and flexible services, negotiated through Service Level Agreements (SLAs), which give users access to virtually limitless computing resources [1]. According to the National Institute of Standards and Technology (NIST), cloud computing is a pay-per-use model that enables on-demand, convenient network access to a shared pool of configurable resources, which can be rapidly provisioned with minimal management effort or service provider interaction [2]. Cloud deployment models include private, public, hybrid, and community clouds, with services categorized into Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS providers such as Google Compute Engine, Windows Azure Virtual Machines, and Amazon Elastic Compute Cloud offer network, compute, and storage resources, enhancing performance and reducing maintenance costs to meet specific customer demands [3, 4]. This evolution in cloud computing has transformed various sectors. Businesses and healthcare organizations benefit from cost reduction through resource outsourcing [3, 4], performance monitoring [5, 6], resource management [7], and computing prediction [8]. Additionally, cloud computing facilitates tasks such as resource allocation [9], workload distribution [10,11,12], capacity planning [13], and job-based resource distribution [14, 15]. This impact underscores the significance of cloud computing in the modern digital landscape, giving organizations greater efficiency and scalability in resource utilization [3,4,5,6,7,8,9,10,11,12,13,14,15].
Despite the availability of various data services, data owners are apprehensive about entrusting their valuable data to cloud service providers (CSPs) for third-party cloud storage, owing to concerns about the trustworthiness of the CSPs [13, 16, 17] and the shared nature of cloud storage environments. Cloud computing primarily encompasses data storage and computation, with Infrastructure as a Service (IaaS) closely linked to cloud storage. When accessing IaaS, cloud users often cannot see where exactly their outsourced data reside within the cloud storage, or which machines process their tasks. Consequently, data privacy within cloud storage is a significant security challenge, exacerbated by the presence of malicious users, and it gives rise to data integrity and confidentiality issues. Trust in remote cloud data storage is therefore crucial for the success of cloud computing. Data integrity, encompassing completeness, correctness, and consistency, is vital in the context of Database Management Systems (DBMS) and the ACID (Atomicity, Consistency, Isolation, Durability) properties of transactions. The issue arises when CSPs cannot securely guarantee to clients that the data returned in response to their queries are accurate and complete [18].
Researchers are actively advancing the field of data integrity in cloud computing by refining data integrity verification techniques and strengthening privacy-preserving methods. These verification techniques primarily comprise Proof of Ownership (PoW), Provable Data Possession (PDP), and Proof of Retrievability (PoR). Notably, the introduction of a Message Authentication Code (MAC) computed with a unique random key marked a deterministic approach to data integrity verification, mitigating the inefficiencies of remote data integrity schemes that employed RSA-based encryption. This approach addressed the significant computation time and the long hash value transfer times incurred for large files [19]. To enhance the security of data integrity schemes, PDP concepts were introduced to establish that a cloud server legitimately possesses the data. Subsequent research efforts have continually refined these algorithms, introducing innovations such as the Transparent PDP scheme [20], DHT-PDP [21], the Certificateless PDP Protocol for Multiple Copies [22,23,24], and Dynamic Multiple-Replica PDP [25]. Concurrently, the PoR concept was introduced in 2007 to address error localization and data recovery issues [26]. Additionally, PoW emerged in 2011, built on a Merkle hash tree protocol, to guard against malicious adversaries, and it led to numerous subsequent research efforts proposing improved algorithms with the same goals [27,28,29].
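As an illustration of the Merkle hash tree on which such ownership and possession proofs are commonly built, the following is a minimal sketch rather than the protocol of any cited scheme. The function names, the use of SHA-256, the domain-separation prefixes, and the rule of duplicating the last node on odd-sized levels are all assumptions made for this example.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest (the hash function is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def _next_level(level):
    """Pair up nodes and hash each pair; the last node is duplicated if needed."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(blocks):
    """Return the Merkle root of a non-empty list of byte blocks."""
    level = [_h(b"\x00" + b) for b in blocks]  # leaf hashes
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(blocks, index):
    """Return the sibling path [(sibling_hash, sibling_is_left), ...] for blocks[index]."""
    level = [_h(b"\x00" + b) for b in blocks]
    path = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # the neighbour within the same pair
        path.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return path


def verify_block(block, path, root):
    """Recompute the root from one block and its sibling path."""
    node = _h(b"\x00" + block)
    for sibling, sibling_is_left in path:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root


blocks = [b"block-0", b"block-1", b"block-2"]
root = merkle_root(blocks)
proof = merkle_proof(blocks, 2)
print(verify_block(b"block-2", proof, root))   # an untouched block verifies
print(verify_block(b"tampered", proof, root))  # a modified block does not
```

A verifier that keeps only the root can check any single block using a path whose length grows logarithmically with the number of blocks, which is why the structure is attractive for large outsourced files.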
Fully homomorphic encryption (FHE) was proposed to preserve the privacy of outsourced data: the original data are converted into ciphertext by an encryption technique that supports addition and multiplication directly over the ciphertext [30]. The drawbacks noted in [22], such as practical infeasibility due to complex operations, were later addressed by the Somewhat Homomorphic Encryption (SHE) scheme of [31]. Further research in recent years includes a biometric face recognition approach [32], a privacy-preserving auditing scheme for cloud storage using homomorphic linear authenticators (HLA) [33], and an etiquette approach for preserving data [34].
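To make the idea of computing over ciphertext concrete, the sketch below uses the Paillier cryptosystem, which is only additively homomorphic. It is therefore neither FHE nor the SHE scheme of [31], and it is swapped in here purely because it fits in a few lines. The tiny primes, the function names, and the choice g = n + 1 are assumptions of this example, and the parameters are far too small to be secure.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because g = n + 1 is used below
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m with 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


n, priv = keygen(1009, 1013)  # hypothetical demo primes
c1, c2 = encrypt(n, 20), encrypt(n, 22)

# Multiplying ciphertexts adds the underlying plaintexts.
print(decrypt(n, priv, (c1 * c2) % (n * n)))  # 42
# Raising a ciphertext to a constant multiplies the plaintext by it.
print(decrypt(n, priv, pow(c1, 3, n * n)))    # 60
```

The server performing these operations never sees 20 or 22. A fully homomorphic scheme extends the same principle to both addition and multiplication of encrypted values, at a much higher computational cost.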
Recently, Google Cloud introduced Zebra Technologies, based on a Security Command Center (SCC) and a security operations center (SOC), to flag harmful threats such as cryptomining activity, data exfiltration, potential malware infections, and brute-force SSH attacks, and thereby to maintain the integrity of business organizations’ information [35].
In recent years, numerous cloud data integrity schemes have emerged, along with several survey papers, although each survey considers only a limited set of parameters and addresses specific aspects of data integrity. These surveys cover data auditing from single copies to multiple replicas [36], Proof of Retrievability [37], and various data integrity techniques, verification types, and protocols for cloud storage [38]. However, they often fall short of providing a comprehensive understanding of data integrity strategies and their classification. A concise taxonomy of data integrity schemes was presented in the survey of [39], which gave a comparative analysis of existing data integrity schemes and traced their evolution from 2007 to 2015, but covered few physical storage issues, security challenges, and design considerations. The present survey aims to address this gap by offering an in-depth discussion of the security challenges within physical cloud storage, potential threats, attacks, and their mitigations. It also categorizes data integrity schemes, outlines their phases and characteristics, provides a comparative analysis, and projects future trends. This comprehensive approach underscores the significance of data integrity schemes in securing cloud storage.
Discussion
Although several articles address similar issues, our work differs from the research mentioned above in the following ways. Unlike [36, 37, 39], our work focuses on the different types of storage-based attacks and includes up-to-date methods to resist such attacks, which routinely violate data integrity schemes on physical cloud storage. Like [37], it covers storage-based security issues, threats, and their existing mitigation solutions. Unlike [36, 37, 39], our work focuses on the different proposals for data integrity verification, which are broadly classified into file-level verification, entire-block verification, metadata verification, and random block-level verification.
Unlike [37], our survey is not restricted to proof of retrievability (PoR). It covers all verification types, namely proof of ownership (PoW), proof of retrievability (PoR), and provable data possession (PDP). It also includes the different types of auditing verification techniques, in order to elaborate the roles of the third-party auditor (TPA) and the data owner (DO), and it discusses how public auditing reduces the computational and communication overhead of the DO. Unlike [36,37,38, 40,41,42,43], our survey reviews a wide range of quality features of data integrity schemes, each of which is individually of prime importance in cloud storage security. Unlike [36, 37, 41], we focus on different types of security challenges according to the existing symptoms, effects, and probable solutions of data integrity schemes. Like [42,43,44], we include a discussion of malicious insider attacks, forgery attacks, and dishonest TPAs and CSPs. Unlike [41, 43, 44], our comparative analysis introduces different performance analysis parameters for existing works, based on each work’s motivations and limitations, in addition to a discussion of public and private data auditing criteria. Like [32], we briefly include all existing data integrity methods in the Comparative analysis of data integrity strategies section.
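The random block-level verification category mentioned above can be sketched as a simple challenge-response exchange. This is a deliberately simplified private-verification example and not a PDP scheme from the cited literature: practical schemes use homomorphic tags so that the server does not have to return the blocks themselves. All function names, the HMAC-SHA256 tag construction, and the block layout are assumptions of this sketch.

```python
import hashlib
import hmac
import secrets


def _tag(key, index, block):
    """Tag one block; binding the index prevents the server from swapping blocks."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def tag_blocks(key, blocks):
    """Data owner: compute one tag per block before outsourcing the file."""
    return [_tag(key, i, b) for i, b in enumerate(blocks)]


def challenge(num_blocks, sample_size):
    """Verifier: pick distinct random block indices to audit."""
    rng = secrets.SystemRandom()
    return rng.sample(range(num_blocks), min(sample_size, num_blocks))


def respond(stored_blocks, stored_tags, indices):
    """Server: return the challenged blocks together with their stored tags."""
    return [(i, stored_blocks[i], stored_tags[i]) for i in indices]


def verify(key, indices, response):
    """Verifier: check the response covers the challenge and every tag matches."""
    if [i for i, _, _ in response] != list(indices):
        return False
    return all(
        hmac.compare_digest(tag, _tag(key, i, block)) for i, block, tag in response
    )


key = secrets.token_bytes(32)
blocks = [bytes([i]) * 16 for i in range(100)]
tags = tag_blocks(key, blocks)

indices = challenge(len(blocks), 10)
print(verify(key, indices, respond(blocks, tags, indices)))  # honest server passes
```

Sampling keeps the audit cost independent of the file size. If t of n blocks are corrupted and c distinct blocks are challenged, at least one corrupted block is hit with probability 1 - C(n - t, c) / C(n, c), so a modest sample already detects large-scale corruption with high probability.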
Research gap
Based on the above discussion, this research addresses the following research gaps:
In contrast to [36, 37, 39], our research includes current strategies to defend against storage-based attacks, which consistently compromise data integrity techniques on physical cloud storage.
Our research, in contrast to [36, 37, 39], concentrates on the various approaches to data integrity verification, which fall into four categories: file-level verification, full block verification, metadata verification, and randomized block-level verification.
Our survey is not limited to proof of retrievability (PoR), in contrast to [37]. It includes all forms of verification, namely provable data possession (PDP), proof of retrievability (PoR), and proof of ownership (PoW). The different key management techniques used to improve the security of cloud storage are also covered.
In contrast to [36,37,38, 40,41,42,43], our survey work examines a variety of data integrity scheme quality features, each of which is crucial to the security of cloud storage.
In contrast to [36, 37, 41], we concentrated on various security issues based on the impacts, symptoms, and likely fixes of data integrity techniques.
In contrast to [41, 43, 44], we present various performance analysis parameters of previous efforts, based on the goals and constraints of each work, together with a discussion of the criteria for both public and private data auditing.
Contribution
To the best of our knowledge, this is the first attempt to give an overview of all the related issues of cloud data storage, with possible directions, in a single article. The key contributions of this research paper are summarized below:
Identification of the possible attacks on storage-level services that may arise on physical cloud storage, together with the mitigation solutions explored so far
Summary of the possible characteristics of data integrity strategies, covering auditing soundness, phases, classification, etc., in order to understand and analyse security loopholes
Literature review on comparative analysis based on all characteristics, motivation, limitation, accuracy, method, and probable attacks
Discussion on design goal issues along with security level issues on data integrity strategy to analyse dynamic performance efficiency, different key management techniques to achieve security features, to analyse server attacks, etc.
Identification of security issues in data integrity strategies and their mitigation solutions
Discussion about the future direction of new data integrity schemes of cloud computing.
The remainder of this review article is organized as follows. The Issues of physical cloud storage section discusses issues of physical cloud storage and attacks on storage-level services. The Key management techniques with regards to storage level in cloud section describes existing key management techniques that enhance the security of cloud storage. The Potential attacks in storage level service section describes potential attacks on cloud storage. The Phases of data integrity technique section describes the phases of the data integrity scheme and summarizes the possible characteristics of the data integrity strategy. The Classification of data integrity strategy section presents a classification of data integrity strategies. The Characteristics of data integrity technique section describes the characteristics of data integrity techniques. The Challenges of data integrity technique in cloud environment section describes the challenges that data integrity techniques face in the cloud environment. The Desire design challenges of data integrity strategy section describes the desired design challenges of data integrity strategies. The Comparative analysis of data integrity strategies section presents a comparative analysis of existing research on data integrity strategies. Finally, the Future trends in data integrity approaches section presents design goal issues and future trends of cloud storage based on existing integrity schemes, using a timeline infographic from 2016 to 2022.